Colin Grudzien1, Marc Bocquet2
With thanks to Eric Olson1 & Mihye Ahn1 for computing logistics and support.
Iterative, ensemble-variational methods form the basis of the state-of-the-art in nonlinear, scalable data assimilation (DA)1.
However, a barrier to the use of iterative, ensemble-variational schemes in online, sequential DA still lies in the computational bottleneck of simulating the nonlinear, physics-based model to perform the sampling procedure.
Many iterative smoothing procedures require multiple runs of the ensemble forecast to produce the forecast, filtering and re-analyzed smoothing statistics.
For online applications in which there is a short operational time-window to produce a forecast for future states, the iterative ensemble simulations may be prohibitively expensive.
Particularly in cases where nonlinearity in the DA cycle is not dominated by the error dynamics in the ensemble forecast, the cost of an iterative ensemble forecast solution may not be justified by the improvement in forecast accuracy.
Nonlinearity in the DA cycle may instead be dominated by other aspects of the analysis, e.g., a nonlinear observation operator or the optimization of filter hyper-parameters.
Particularly for sequential forecast cycles in which the linear-Gaussian approximation of the forecast error may be appropriate, as in synoptic-scale meteorology, we consider a simple hybridization of the classic ensemble transform Kalman smoother (ETKS) with the iterative ensemble Kalman smoother (IEnKS) to produce a single-iteration, fixed-lag, sequential smoother.
We combine: (i) the sequential, filtering analysis of the classic ETKS, and (ii) the retrospective re-analysis of the initial ensemble, which is used to re-initialize the next DA cycle as in the IEnKS,
to improve the EnKF filter and forecast statistics in the subsequent DA cycle.
The resulting scheme is a “single-iteration” ensemble Kalman smoother that sequentially solves the filtering, Bayesian MAP cost function; it seeks to minimize the leading-order cost of ensemble-variational smoothing, i.e., the ensemble forecast over the data assimilation window (DAW).
This is an outer-loop optimization of the cost of fixed-lag, sequential smoothing, designed for online applications with weakly nonlinear forecast error dynamics.
We will denote the general scheme a “single-iteration” smoother, while the specific implementation presented here is denoted the single-iteration ensemble transform Kalman smoother (SIETKS).
For linear-Gaussian systems with the perfect model hypothesis, the SIETKS is a fully consistent Bayesian estimator, albeit one that uses redundant model simulations.
When the forecast error dynamics are weakly nonlinear, yet other aspects of the DA cycle are strongly nonlinear, we demonstrate that the SIETKS has predictive performance comparable to fully iterative methods.
We furthermore derive a method of multiple data assimilation (MDA) for this hybrid smoother scheme; the result is a two-stage algorithm that estimates the forecast, filtering, and re-analyzed smoothing statistics, shifting the smoother forward in time with two sweeps of the ensemble forecast over the lagged states.
With Gaussian error densities,
\[ \begin{align} p(\pmb{x}_L\vert \pmb{y}_{1:L-1}) = N\left(\overline{\pmb{x}}_L^\mathrm{fore}, \mathbf{B}_L^\mathrm{fore}\right) & & p\left(\pmb{y}_L \vert \pmb{x}_L \right) = N\left(\mathbf{H}_L \pmb{x}_L, \mathbf{R}_L\right) \end{align} \] the filtering MAP estimation can be written as a least-squares optimization as follows.
Suppose that an ensemble \( \mathbf{E}_{L}\in \mathbb{R}^{N_x \times N_e} \) is drawn iid from \( p(\pmb{x}_L\vert \pmb{y}_{1:L-1}) \).
Define the ensemble-based, empirical estimates,
\[ \begin{align} & & \hat{\pmb{x}}_L^\mathrm{fore} &= \mathbf{E}_L^\mathrm{fore} \pmb{1} / N_e ; & \hat{\pmb{\delta}}_L &= \mathbf{R}_L^{-\frac{1}{2}}\left(\pmb{y}_L - \mathbf{H}_L \hat{\pmb{x}}^\mathrm{fore}_L\right)\\ &&\mathbf{X}_L^\mathrm{fore} &= \mathbf{E}_L^\mathrm{fore} - \hat{\pmb{x}}^\mathrm{fore}_L \pmb{1}^\top ; & \mathbf{P}^\mathrm{fore}_L &= \mathbf{X}_L^\mathrm{fore} \left(\mathbf{X}_L^\mathrm{fore}\right)^\top / \left(N_e - 1\right);\\ & &\mathbf{S}_L &:=\mathbf{R}_L^{-\frac{1}{2}}\mathbf{H}_L \mathbf{X}_L^\mathrm{fore}. \end{align} \]
Maximizing the empirical posterior density is thus equivalent to minimizing the cost function
\[ \begin{alignat}{2} & & {\color{#d95f02} {\widetilde{\mathcal{J}} (\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2} \parallel \hat{\pmb{x}}_L^\mathrm{fore} - \mathbf{X}^\mathrm{fore}_L \pmb{w}- \hat{\pmb{x}}^\mathrm{fore}_L \parallel_{\left(\mathbf{P}^\mathrm{fore}_L\right)^{-1}}^2} } + {\color{#7570b3} {\frac{1}{2} \parallel \pmb{y}_L - \mathbf{H}_L\hat{\pmb{x}}^\mathrm{fore}_L - \mathbf{H}_L \mathbf{X}^\mathrm{fore}_L \pmb{w} \parallel_{\mathbf{R}^{-1}_L}^2 } }\\ \Leftrightarrow& & {\color{#d95f02} {\widetilde{\mathcal{J}}(\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2}(N_e - 1) \parallel\pmb{w}\parallel^2} } + {\color{#7570b3} {\frac{1}{2}\parallel \hat{\pmb{\delta}}_L - \mathbf{S}_L \pmb{w}\parallel^2 } } \end{alignat} \] which is an optimization over a weight vector \( \pmb{w} \) in the ensemble dimension \( N_e \) rather than in the state space dimension \( N_x \).
Setting the gradient \( \nabla_{\pmb{w}} \widetilde{\mathcal{J}} \) equal to zero, the minimizing weights \( \overline{\pmb{w}} \) are found with a single Newton step from the initialization \( \pmb{w}_0 = \pmb{0} \),
\[ \begin{align} \overline{\pmb{w}} = \pmb{0} - {\widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}}^{-1} \nabla_{\pmb{w}} \widetilde{\mathcal{J}}, \end{align} \] where \( \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}:= \nabla^2_{\pmb{w}} \widetilde{\mathcal{J}} \) is the Hessian of the cost function.
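For this quadratic cost function, the gradient and Hessian are given explicitly by
\[ \begin{align} \nabla_{\pmb{w}} \widetilde{\mathcal{J}} &= (N_e - 1)\pmb{w} - \mathbf{S}_L^\top\left(\hat{\pmb{\delta}}_L - \mathbf{S}_L \pmb{w}\right); & \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}} &= (N_e - 1)\mathbf{I} + \mathbf{S}_L^\top \mathbf{S}_L; \end{align} \]
so that the Newton step above evaluates to \( \overline{\pmb{w}} = \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}^{-1}\mathbf{S}_L^\top \hat{\pmb{\delta}}_L \).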
If we define a right-transform matrix,
\[ \begin{align} \mathbf{T}:= \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}^{-\frac{1}{2}} \end{align} \]
we similarly have the update for the covariance as
\[ \begin{align} \mathbf{P}^\mathrm{filt}_L = \left(\mathbf{X}_L^\mathrm{fore} \mathbf{T} \right)\left(\mathbf{X}_L^\mathrm{fore} \mathbf{T} \right)^\top, \end{align} \]
with the corresponding filtered anomalies given by \( \mathbf{X}^\mathrm{filt}_L := \sqrt{N_e - 1}\, \mathbf{X}^\mathrm{fore}_L \mathbf{T} \), so that \( \mathbf{P}^\mathrm{filt}_L = \mathbf{X}^\mathrm{filt}_L \left(\mathbf{X}^\mathrm{filt}_L\right)^\top / (N_e - 1) \).
The numerical cost of the Newton step and of forming the transform \( \mathbf{T} \) is dominated by the eigenvalue decomposition of \( \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}} \in \mathbb{R}^{N_e \times N_e} \).
This sketches the ensemble transform Kalman filter recursion in terms of a right-transform, as in the derivation of the local ensemble transform Kalman filter (LETKF).
In the ETKF formalism, we can define an ensemble right-transform \( \boldsymbol{\Psi}_k \) such that for any \( t_k \),
\[ \begin{align} \mathbf{E}^\mathrm{filt}_k = \mathbf{E}^\mathrm{fore}_k \boldsymbol{\Psi}_k \end{align} \] where in the above we would say that \[ \begin{align} \mathbf{E}^\mathrm{filt}_k &\sim p(\pmb{x}_k \vert \pmb{y}_{1:k}) \\ \mathbf{E}^\mathrm{fore}_k &\sim p(\pmb{x}_k \vert \pmb{y}_{1:k-1}) \end{align} \]
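As an illustration, the following is a minimal NumPy sketch of this right-transform analysis step, assuming a linear observation operator and observation error covariance supplied as dense arrays; the function and variable names are illustrative rather than a reference implementation.

```python
import numpy as np

def etkf_transform(E_fore, H, R, y):
    """One ETKF analysis: return the filtered ensemble and the right-transform Psi.

    E_fore : (N_x, N_e) forecast ensemble, members stored as columns
    H      : (N_y, N_x) linear observation operator
    R      : (N_y, N_y) observation error covariance
    y      : (N_y,)     observation vector
    """
    N_x, N_e = E_fore.shape
    one = np.ones(N_e)

    # Ensemble mean and anomalies
    x_mean = E_fore @ one / N_e
    X = E_fore - np.outer(x_mean, one)

    # Whitened innovation and observed anomalies: delta = R^{-1/2}(y - H x), S = R^{-1/2} H X
    R_sqrt_inv = np.linalg.inv(np.linalg.cholesky(R))
    delta = R_sqrt_inv @ (y - H @ x_mean)
    S = R_sqrt_inv @ (H @ X)

    # Hessian of the ensemble-space cost function and its eigenvalue decomposition
    hess = (N_e - 1) * np.eye(N_e) + S.T @ S
    evals, V = np.linalg.eigh(hess)

    # Single Newton step for the weights and the symmetric square-root transform
    w = V @ ((V.T @ (S.T @ delta)) / evals)
    T = V @ np.diag(evals ** -0.5) @ V.T

    # Right-transform acting on the forecast ensemble: E_filt = E_fore @ Psi
    Pi = np.outer(one, one) / N_e   # projection onto the ensemble mean
    Psi = Pi + (np.eye(N_e) - Pi) @ (np.outer(w, one) + np.sqrt(N_e - 1) * T)
    return E_fore @ Psi, Psi
```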
We will associate \( \mathbf{E}^\mathrm{filt}_k \equiv \mathbf{E}^\mathrm{post}_{k|k} \); more generally, for a lagged state at time \( t_k \), conditioning on observations through time \( t_L \) is performed by
\[ \begin{align} \mathbf{E}^\mathrm{post}_{k |L} = \mathbf{E}^\mathrm{post}_{k|L-1}\boldsymbol{\Psi}_{L} & & \mathbf{E}^\mathrm{post}_{k|K} \sim p(\pmb{x}_k \vert \pmb{y}_{1:K}). \end{align} \]
Then we can perform a retrospective smoothing analysis on all past states stored in memory by using the latest right-transform update from the filtering step.
This form of retrospective analysis is the basis of the ensemble transform Kalman smoother (ETKS)6.
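Continuing the illustrative sketch above, the retrospective analysis amounts to a single right-multiplication of every lagged ensemble held in memory by the newest filtering transform, e.g.,

```python
def retrospective_update(lagged_ensembles, Psi_L):
    """Apply the newest filtering right-transform to all lagged ensembles in memory.

    lagged_ensembles : list of (N_x, N_e) arrays, E_post[k|L-1] for stored times t_k
    Psi_L            : (N_e, N_e) right-transform from the filtering analysis at t_L
    Returns the re-analyzed ensembles E_post[k|L] = E_post[k|L-1] @ Psi_L.
    """
    return [E_k @ Psi_L for E_k in lagged_ensembles]
```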
The information in the posterior estimate thus flows in reverse time to the lagged states stored in memory, but the information flow is unidirectional in this scheme.
The improved posterior estimate for the lagged states does not improve the filtering estimate in the perfect, linear-Gaussian model:
\[ \begin{align} \mathbf{M}_{k} \mathbf{E}_{k-1|L}^\mathrm{post}& = \mathbf{M}_{k} \mathbf{E}_{k-1|k-1}^\mathrm{post} \prod_{j=k}^{L} \boldsymbol{\Psi}_j \\ & =\mathbf{E}_{k|k}^\mathrm{post}\prod_{j=k+1}^L \boldsymbol{\Psi}_j\\ & = \mathbf{E}_{k|L}^\mathrm{post}. \end{align} \]
This demonstrates that the effect of the conditioning on the information from the observation is covariant with the dynamics.
In fact, in the case of the perfect, linear-Gaussian model, the backward-in-time conditioning is equivalent to the reverse time model forecast \( \mathbf{M}_k^{-1} \), as demonstrated by Rauch, Tung and Striebel (RTS).7
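Rearranging the identity above (assuming \( \mathbf{M}_k \) is invertible), this equivalence reads
\[ \begin{align} \mathbf{E}^\mathrm{post}_{k-1|L} = \mathbf{M}_{k}^{-1} \mathbf{E}^\mathrm{post}_{k|L}. \end{align} \]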
This covariance of the conditioning with the model dynamics holds neither in the case of nonlinear dynamics nor in the presence of model error.
Re-initializing the DA cycle in a perfect, nonlinear model with the smoothed conditional ensemble estimate \( \mathbf{E}^\mathrm{post}_{0|L} \) can dramatically improve the performance of the subsequent forecast and filtering statistics.
Let us denote the composition of the model forecast as,
\[ \begin{align} \mathcal{M}_{k:m} : = \mathcal{M}_k \circ \cdots \circ \mathcal{M}_{m} & & \mathbf{M}_{k:m} := \mathbf{M}_{k}\cdots \mathbf{M}_{m} \end{align} \]
In particular, this exploits the mismatch, under perfect nonlinear dynamics, between the forecast of the smoothed ensemble and the filtered ensemble:
\[ \begin{align} \mathcal{M}_{L:1}\left(\mathbf{E}_{0|L}^\mathrm{post}\right):= \mathcal{M}_L \circ \cdots \circ \mathcal{M}_1\left(\mathbf{E}_{0|L}^\mathrm{post}\right) \neq \mathbf{E}_{L}^\mathrm{filt}. \end{align} \]
The effectiveness of the linear-Gaussian approximation strongly depends on the length of the forecast window \( \Delta t \);
If the dynamics for the evolution of the densities are weakly nonlinear, re-initializing the model forecast with the posterior estimate (under the linear-Gaussian cost function) can bring new information into the forecast states in the next DAW.
This has been exploited to a great extent by utilizing the 4D cost function, in which the filtering MAP cost function is extended over multiple observations simultaneously and is written directly in terms of a lagged state.
Suppose now we want to write \( p(\pmb{x}_{1:L}\vert \pmb{y}_{1:L}) \), the joint smoothing posterior over the current DAW, as a recursive update to the last smoothing posterior.
We will suppose that the DAW is shifted forward in time by an arbitrary number of analysis times \( S \geq 1 \) between successive smoothing cycles.
Using a Bayesian analysis like before, we can write
\[ \begin{align} p(\pmb{x}_{1:L} \vert \pmb{y}_{0:L}) &\propto \int \mathrm{d}\pmb{x}_0 \underbrace{p(\pmb{x}_0 \vert \pmb{y}_{0:L-1})}_{(1)} \underbrace{\left[\prod_{k=L-S+1}^L p(\pmb{y}_k \vert \pmb{x}_k) \right]}_{(2)} \underbrace{\left[\prod_{k=1}^L \pmb{\delta}\left\{\pmb{x}_k - \mathcal{M}_{k} \left(\pmb{x}_{k-1}\right) \right\} \right]}_{(3)} \end{align} \] where (1) is the marginal density for the initial state, conditioned on the previously assimilated observations, (2) contains the likelihoods of the observations newly entering the DAW, and (3) encodes the perfect-model evolution of the states as Dirac distributions.
Using the fact that \( p(\pmb{x}_{1:L} \vert \pmb{y}_{1:L} ) \propto p(\pmb{x}_{1:L} \vert \pmb{y}_{0:L}) \), we can chain together a recursive fixed-lag smoother, sequentially across DAWs.
Under the linear-Gaussian assumption, the resulting cost function takes the form
\[ \begin{alignat}{2} & & {\color{#d95f02} {\widetilde{\mathcal{J}} (\pmb{w})} } &= {\color{#d95f02} {\frac{1}{2} \parallel \hat{\pmb{x}}_0^\mathrm{post} - \mathbf{X}^\mathrm{post}_0 \pmb{w}- \hat{\pmb{x}}^\mathrm{post}_0 \parallel_{\left(\mathbf{P}^\mathrm{post}_{0|L-1}\right)^{-1}}^2} } + {\color{#7570b3} {\sum_{k=L-S+1}^L \frac{1}{2} \parallel \pmb{y}_k - \mathbf{H}_k {\color{#1b9e77} { \mathbf{M}_{k:1}} }\hat{\pmb{x}}^\mathrm{post}_0 - \mathbf{H}_k {\color{#1b9e77} {\mathbf{M}_{k:1}}} \mathbf{X}^\mathrm{post}_0 \pmb{w} \parallel_{\mathbf{R}^{-1}_k}^2 } } \\ \Leftrightarrow& & \widetilde{\mathcal{J}}(\pmb{w}) &= \frac{1}{2}(N_e - 1) \parallel\pmb{w}\parallel^2 + \sum_{k=L-S+1}^L \frac{1}{2}\parallel \hat{\pmb{\delta}}_k - \mathbf{S}_k \pmb{w}\parallel^2 \end{alignat} \] where the weights \( \pmb{w} \) give the update to the initial state \( \pmb{x}_{0|L-1} \).
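Here, in analogy with the filtering case, the lagged observed anomalies and whitened innovations are defined with the linearized forecast model as
\[ \begin{align} \mathbf{S}_k := \mathbf{R}_k^{-\frac{1}{2}} \mathbf{H}_k \mathbf{M}_{k:1} \mathbf{X}^\mathrm{post}_0 & & \hat{\pmb{\delta}}_k := \mathbf{R}_k^{-\frac{1}{2}}\left(\pmb{y}_k - \mathbf{H}_k \mathbf{M}_{k:1} \hat{\pmb{x}}^\mathrm{post}_0\right), \end{align} \]
so that the gradient and Hessian of the 4D cost function are
\[ \begin{align} \nabla_{\pmb{w}} \widetilde{\mathcal{J}} = (N_e - 1)\pmb{w} - \sum_{k=L-S+1}^L \mathbf{S}_k^\top \left(\hat{\pmb{\delta}}_k - \mathbf{S}_k \pmb{w}\right) & & \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}} = (N_e - 1)\mathbf{I} + \sum_{k=L-S+1}^L \mathbf{S}_k^\top \mathbf{S}_k. \end{align} \]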
In the linear-Gaussian case, the solution can again be found by a single iteration of Newton's descent for the cost function above,
\[ \begin{align} \overline{\pmb{w}} &= \pmb{0} - \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}^{-1}\nabla_{\pmb{w}} \widetilde{\mathcal{J}};\\ \mathbf{T} &= \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}^{-\frac{1}{2}};\\ \mathbf{P}_{0|L}^\mathrm{post} &= \left(\mathbf{X}^\mathrm{post}_{0|L-1}\mathbf{T}\right)\left(\mathbf{X}^\mathrm{post}_{0|L-1}\mathbf{T}\right)^\top. \end{align} \]
When the state and observation models are nonlinear, using \( \mathcal{H}_k \) and \( \mathcal{M}_{k:1} \) in the cost function, this cost function can be minimized iteratively to find a local minimum with a recursive Newton step, as above.
This approach is at the basis of the iterative ensemble Kalman smoother (IEnKS)8, which produces a more accurate solution than the linear approximation when the models are highly nonlinear;
Forming a Gaussian mixture model increases the variance of the estimator and reduces the bias of the linear-Gaussian approximation;
While accuracy can increase with iterations, every iteration of the 4D cost function comes at the cost of an additional ensemble forecast over the DAW.
In synoptic meteorology, e.g., \( \Delta t \approx 6 \) hours, the linear-Gaussian approximation of the evolution of the densities is actually an adequate approximation;
However, the iterative optimization over hyper-parameters in the filtering step of the classical ETKS can be run without the additional cost of model forecasts.
Iterative optimization of the filtering cost function grows only in the cost of the eigenvalue decomposition of \( \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}} \in \mathbb{R}^{N_e \times N_e} \).
Subsequently, the retrospective analysis in the form of the filtering right-transform can be applied to condition the initial ensemble,
\[ \begin{align} \mathbf{E}^\mathrm{post}_{0|L} = \mathbf{E}_{0|L-1}^\mathrm{post} \boldsymbol{\Psi}_L. \end{align} \]
Like in the IEnKS, we initialize the next DA cycle in terms of the retrospective analysis, and gain the benefit of the improved initial estimate.
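To make the outer-loop structure concrete, below is a schematic NumPy sketch of one SIETKS cycle with lag \( L \) and shift \( S = L \) (non-overlapping DAWs, so that each observation is assimilated exactly once); it reuses the illustrative `etkf_transform` helper above and assumes a user-supplied ensemble propagator `forecast`, both of which are our own naming rather than a reference implementation.

```python
import numpy as np

def sietks_cycle(E0_post, obs, H, R, forecast, L):
    """One single-iteration smoother cycle over a DAW of L observation times.

    E0_post : (N_x, N_e) posterior ensemble at the start of the DAW
    obs     : list of L observation vectors y_1, ..., y_L
    forecast: callable (E, k) -> E, the (nonlinear) ensemble forecast from t_{k-1} to t_k
    Returns the re-analyzed initial ensemble E_{0|L} and the filter ensemble at time L.
    """
    E = E0_post.copy()
    N_e = E.shape[1]
    Psi_total = np.eye(N_e)

    # Sweep 1: sequential forecast and filtering analyses over the DAW,
    # accumulating the right-transform of each filtering step
    for k in range(1, L + 1):
        E = forecast(E, k)                            # ensemble forecast to time t_k
        E, Psi = etkf_transform(E, H, R, obs[k - 1])  # sequential filtering analysis
        Psi_total = Psi_total @ Psi

    # Retrospective analysis: condition the initial ensemble on all observations in the DAW
    E0_smooth = E0_post @ Psi_total

    # Sweep 2 (next cycle): the DA cycle is re-initialized from E0_smooth, so the
    # ensemble forecast over the DAW is repeated once from the smoothed initial condition
    return E0_smooth, E
```

For overlapping DAWs (shift \( S < L \)), the same observations would enter successive analyses, a situation the MDA formulation introduced above is designed to address.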
The single-iteration ensemble transform Kalman smoother (SIETKS) is an outer loop re-organization of the filtering steps of other smoothing schemes.
This uses the lagged, marginal posterior of 4D iterative schemes, but iterates in the filtering step of the classical EnKS.
The SIETKS sequentially solves the filtering cost function instead of the 4D cost function.
A key question is whether this can be used to iteratively solve for the localization hyper-parameters that may be used in an operational forecasting system.
Additionally, we are working to extend the method to the case of stochastic and parametric model error.